PNG (Portable Network Graphics) Specification, Sixth Draft

By:

Thomas Boutell, boutell@netcom.com
Mark Adler, madler@cco.caltech.edu
Lee Daniel Crocker, lcrocker@netcom.com

Permission granted to reproduce this specification in complete and unaltered form. Excerpts may be printed with the following notice: "excerpted from the PNG (Portable Network Graphics) specification by Thomas Boutell." No notice is required in software that follows this specification; notice is only required when reproducing or excerpting from the specification itself.

The author wishes to acknowledge the contributions of the New Graphics Format mailing list and the readers of comp.graphics. (Mr. Boutell is solely responsible for errors of fact or design in the PNG specification, however.)

This is the sixth draft of the PNG (formerly "PBF") specification discussion document, replacing all previous drafts. There are several significant changes from the previous drafts.

1. Rationale

The PNG format is intended to provide a portable, legally unencumbered, simple, lossless, streaming-capable, well-compressed, well-specified standard for bitmapped image files which gives new features to the end user at minimal cost to the developer.

It has been asked why the PNG format is not simply an extension of the GIF format. The short answer is that the GIF format is embroiled in legal disputes, does not support 24-bit images and lacks the option of an alpha channel.

It has been asked why the PNG format is not TIFF, or a subset of TIFF. The answer is that TIFF does not support a compression scheme that is not legally encumbered, and that a subset of TIFF would simply frustrate users making the reasonable assumption that a file saved as TIFF from Software XYZ will load into a program supporting our flavor of TIFF. Implementing full TIFF would violate the simplicity constraint.

It has been asked why the PNG format is not IFF, or a sub- or superset of IFF. The same concern applies as with TIFF: users with software that purports to generate IFF files will not be pleased when those files do not load in programs supporting the new specification. In addition, the IFF specification has rarely been accurately implemented and there is considerable disagreement among implementations. The IFF file structure could be used, but was not designed with streaming applications in mind; there are workarounds for this, but they are not widely implemented.

It has been asked why PNG does not include lossy compression. The answer is that JPEG already does an excellent job of lossy compression, and there is no reason to repeat that effort. Different tools, different jobs.

It has been asked why PNG uses network byte order. We have selected one byte ordering and used it consistently. Which order in particular is of little relevance, but network byte order has the advantage that routines to convert to and from it are already available on any platform that supports TCP/IP networking, including all PC platforms.

It has been asked why PNG does not directly support multiple images. It is expected that a metaformat will be created which permits multiple images and uses PNG-like data streams internally, with certain minimal alterations, such as the optional omission of palettes. In such a metaformat, the identifying bytes at the beginning must NOT be the same as for PNG.

PNG has been expressly designed not to be completely dependent on a single compression technique. Although inflate/deflate compression is mentioned in this document, PNG would still exist without it.

PNG supports an alpha channel as well as the transparency-index approach used in GIF. An alpha channel is much more flexible than a transparency index, whereas a transparency index compresses more efficiently.

3. Data Representation Note

Byte Order

All integers wider than one byte are stored in network byte order: the most significant byte comes first, followed by the remaining bytes in descending order of significance (MSB LSB for two-byte integers; B3 B2 B1 B0 for four-byte integers). References to bit 7 refer to the highest bit (128) of a byte; references to bit 0 refer to the lowest bit (1) of a byte.
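A minimal C sketch of reading and writing these integers follows. The helper names (read_u16_be, read_u32_be, write_u32_be) are illustrative and not part of this specification. Assembling each value byte by byte avoids depending on the host's byte order or alignment, and works even where the TCP/IP conversion routines (ntohs, ntohl) are unavailable.

```c
#include <stdint.h>

/* Read a two-byte integer stored most significant byte first (MSB LSB). */
static uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a four-byte integer stored as B3 B2 B1 B0 (B3 most significant). */
static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a four-byte integer in network byte order. */
static void write_u32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```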

Color Values

All color values range from zero (black) to most intense at the maximum value. The AGMA chunk specifies the gamma response of the source device, and viewers are strongly encouraged to compensate for it.

Pixel dimensions

Non-square pixels can be represented, but viewers are not required to account for them; see the APHY chunk.

4. The Format

The Identification Header

The first six bytes always contain the following values (in decimal):

137 08 80 78 71 26

The first two bytes distinguish the file on systems that expect the first two bytes to identify the file. The second byte is also a backspace, which erases the first (nonprinting) character when the file is displayed as text, leaving the following text visible. The next three bytes are the ASCII values of the letters "P", "N", and "G". The last byte is a control-Z character, which lets display stop cleanly on DOS systems if the TYPE command is used to display the file.
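The check below is a sketch that compares the start of a buffer against the six values listed above; the array and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The six identifying bytes from this draft, in decimal:
   137, backspace (8), 'P', 'N', 'G', control-Z (26). */
static const unsigned char png_draft_signature[6] = { 137, 8, 80, 78, 71, 26 };

/* Return 1 if buf begins with the draft signature, 0 otherwise. */
static int has_png_draft_signature(const unsigned char *buf, size_t len)
{
    return len >= sizeof png_draft_signature &&
           memcmp(buf, png_draft_signature, sizeof png_draft_signature) == 0;
}
```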

The Main Section

The remainder of the file consists of a series of chunks. Each chunk consists of a 4-byte chunk type, a 4-byte UNSIGNED length (which counts neither itself nor the chunk type), and the data bytes appropriate to that chunk, if any. Because the length is always present, an implementation can skip a chunk even if it does not recognize that particular chunk type. The last chunk must always be an EOF chunk.
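To show how the length field lets a reader skip chunks it does not understand, here is a sketch that walks a file already loaded into memory. The function walk_chunks and the callback type chunk_fn are hypothetical; the sketch stops at the first "EOF " chunk (the type name EOF padded with a space).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback invoked for each chunk; may be NULL. */
typedef void (*chunk_fn)(const char type[4], uint32_t length,
                         const unsigned char *data, void *ctx);

/* Read a 4-byte integer in network byte order. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk the chunks after the 6-byte identification header. Each chunk is a
   4-byte type, a 4-byte unsigned length, then `length` data bytes.
   Returns 0 once an "EOF " chunk is reached, -1 if the data is truncated
   or ends without an EOF chunk. */
static int walk_chunks(const unsigned char *buf, size_t len,
                       chunk_fn fn, void *ctx)
{
    if (len < 6)
        return -1;
    size_t pos = 6;                       /* skip identification header */
    while (len - pos >= 8) {              /* room for type + length */
        char type[4];
        memcpy(type, buf + pos, 4);
        uint32_t length = be32(buf + pos + 4);
        pos += 8;
        if (length > len - pos)
            return -1;                    /* chunk runs past end of data */
        if (fn)
            fn(type, length, buf + pos, ctx);
        pos += length;                    /* skip the data, known type or not */
        if (memcmp(type, "EOF ", 4) == 0)
            return 0;                     /* EOF must be the last chunk */
    }
    return -1;
}
```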

The four-byte chunk type should consist entirely of uppercase ASCII letters, with the following exceptions:

Spaces (ASCII 32) are permitted at the end in order to pad the type out to four bytes.

Lowercase letters are permitted if the chunk is proprietary (see below).

IMPORTANT:

Even though chunk lengths are unsigned, chunks should not exceed (2^31)-1 bytes in size, so that languages which handle 4-byte unsigned integers poorly can still process them. (1- and 2-byte unsigned integers can be handled in such languages by using the next larger size of integer.)

Note also that the same chunk type may appear more than once, but only where the description of that chunk permits it. Repetition is sometimes necessary in order to implement streaming encoders.

The chunk-ordering mechanism present in the first two drafts has been dropped. Instead, rules regarding chunk order are stated in the description of each chunk.

Ancillary and Critical Chunks

Chunks which are not strictly necessary in order to meaningfully display the contents of the file are known as "ancillary" chunks, and their names must begin with a capital "A" character.

Chunks which are critical to the successful display of the file's contents begin with any other letter.

Critical chunks are necessary in order to properly display the contents of the file. If an implementation encounters a critical chunk type it does not know how to handle, it must indicate this to the user and not display the contents of the file. The image header chunk (HEAD) is an example of a critical chunk.

A hypothetical vector-graphics chunk would also be a critical chunk, since without rendering it the image would appear to be blank, or would contain a background bitmap with no other information.

Ancillary chunks carry information that enhances the image in some fashion, but without which the image can still be successfully displayed. Examples are the comment and copyright chunks.

Proprietary Chunks

If you want others outside your organization to understand a chunk type that you invent, CONTACT THE AUTHOR OF THE PNG SPECIFICATION (boutell@netcom.com) and specify the format of the chunk's data and your preferred chunk type. The author will assign a permanent, unique chunk type. The chunk type will be publicly listed in an appendix of extended chunk types which can be optionally implemented. In the event that Mr. Boutell is unable to maintain the specification, the task will be passed on to a qualified volunteer.

If you do not require or desire that others outside your organization understand the chunk type, you may use a chunk name containing at least one lowercase character. For ancillary chunk types, begin the chunk name with a capital 'A' character. Chunk types containing lowercase letters will never be assigned in the public specification. If you use these chunks for information that is not essential to viewing the image, and you have any desire whatsoever that people without your internal viewer software be able to view the image, you should use an ancillary chunk type rather than a critical chunk type (that is, the chunk type should begin with 'A'). Also note that others may choose the same proprietary chunk names, so it is advantageous to place additional identifying information at the beginning of the chunk's data.
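The naming rules above can be summarized in code. The function classify_chunk below is a hypothetical sketch that applies only the rules stated in this section: uppercase letters, optional trailing spaces, lowercase letters only in proprietary chunks, and a leading capital 'A' for ancillary chunks.

```c
/* Kinds of chunk type under this draft's naming rules. */
enum chunk_kind { CHUNK_INVALID, CHUNK_CRITICAL, CHUNK_ANCILLARY };

/* Classify a four-byte chunk type. Sets *is_proprietary to 1 if the name
   contains a lowercase letter (such names are never assigned publicly). */
static enum chunk_kind classify_chunk(const char type[4], int *is_proprietary)
{
    int seen_space = 0;
    *is_proprietary = 0;
    if (type[0] == ' ')
        return CHUNK_INVALID;              /* spaces may only pad the end */
    for (int i = 0; i < 4; i++) {
        char c = type[i];
        if (c == ' ')
            seen_space = 1;                /* trailing padding */
        else if (seen_space)
            return CHUNK_INVALID;          /* letter after padding */
        else if (c >= 'a' && c <= 'z')
            *is_proprietary = 1;           /* lowercase: proprietary chunk */
        else if (c < 'A' || c > 'Z')
            return CHUNK_INVALID;          /* neither letter nor space */
    }
    /* Ancillary names begin with a capital 'A'; all others are critical. */
    return type[0] == 'A' ? CHUNK_ANCILLARY : CHUNK_CRITICAL;
}
```

A decoder would silently skip an unrecognized CHUNK_ANCILLARY chunk, but on an unrecognized CHUNK_CRITICAL chunk it must inform the user and not display the file.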

Standard Chunks

All PNG implementations must accept the following chunk types in order to be considered PNG-compliant. All implementations must understand and successfully render the critical chunks below. Standalone image viewers should also be capable of displaying the ancillary chunks below, such as the copyright notice, but this is not necessary for applications in which many images may be displayed at once (e.g., WWW browsers).

Chunk Type    Description               

HEAD          Bitmapped image header

              This chunk must appear FIRST if the file contains
              a bitmapped image.

              Width:            4 bytes
              Height:           4 bytes
              Bit depth:        1 byte
              Color type:       1 byte 
              Compression type: 1 byte
              Interlace type:   1 byte

              Width and height are 4-byte integers. Zero
              is an invalid value. The maximum for both
              is (2^31)-1 in order to accommodate languages
              which have difficulty with unsigned 4-byte values.

              Bit depth is a single-byte integer. Valid values
              that software must support are 1, 2, 4, 8, and 16.
              (Note that bit depths of 16 are easily supported on
              8-bit display hardware by dropping the least
              significant byte.)

              Color type is a single-byte integer. Valid values
              are 1, 2, 3 and 4. Color type determines the
              interpretation of the image data.

              Color Type  Valid Bit Depths  Interpretation
              1           1,2,4,8           Each pixel value is a palette 
                                            index; a palette chunk will appear

              2           1,2,4,8,16        Each pixel value is a grayscale 
                                            level, where the largest value is 
                                            white, and zero is black

              3           8,16              Each pixel value is a three-value
                                            series: red (0 = black, max = red),
                                            green (0 = black, max = green),
                                            blue (0 = black, max = blue) 

              4           8,16              Each pixel value is a four-value
                                            series: red (0 = black, max = red),
                                            green (0 = black, max = green),
                                            blue (0 = black, max = blue),
                                            alpha (0 = transparent, 
                                            max = opaque)
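As an illustration, the sketch below decodes the HEAD fields and checks them against the rules in this chunk's description, including the bit depths allowed for each color type. It assumes the chunk data is exactly the 12 bytes listed above; struct head_info, depth_ok, and parse_head are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the HEAD chunk, in the order this draft lists them. */
struct head_info {
    uint32_t width, height;
    unsigned bit_depth, color_type, compression, interlace;
};

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Is bit_depth permitted for color_type, per the table above? */
static int depth_ok(unsigned color_type, unsigned bit_depth)
{
    switch (color_type) {
    case 1: return bit_depth == 1 || bit_depth == 2 || bit_depth == 4 ||
                   bit_depth == 8;
    case 2: return bit_depth == 1 || bit_depth == 2 || bit_depth == 4 ||
                   bit_depth == 8 || bit_depth == 16;
    case 3:
    case 4: return bit_depth == 8 || bit_depth == 16;
    default: return 0;                      /* only types 1-4 are defined */
    }
}

/* Parse and validate HEAD data. Returns 0 on success, -1 on error. */
static int parse_head(const unsigned char *data, size_t length,
                      struct head_info *h)
{
    if (length != 12)
        return -1;                          /* assumed exact size */
    h->width = be32(data);
    h->height = be32(data + 4);
    h->bit_depth = data[8];
    h->color_type = data[9];
    h->compression = data[10];
    h->interlace = data[11];
    if (h->width == 0 || h->height == 0)
        return -1;                          /* zero is invalid */
    if (h->width > 0x7FFFFFFFu || h->height > 0x7FFFFFFFu)
        return -1;                          /* maximum is (2^31)-1 */
    if (!depth_ok(h->color_type, h->bit_depth))
        return -1;
    if (h->compression != 0 || h->interlace > 1)
        return -1;                          /* only values defined so far */
    return 0;
}
```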
 
              Compression type indicates the compression scheme
              which will be used to compress the image data.

              This draft proposes use of the inflate/deflate compression 
              scheme, an LZ77 derivative which is used in zip, gzip, pkzip 
              and related programs, because extensive research has been done
              supporting its legality. Inflate and deflate code
              is available in the zip/unzip packages with a very
              permissive license (yes, permissive enough for
              commercial purposes, see those packages for details).

              At present, only compression type 0 (inflate/deflate 
              compression with a 32K sliding window) is defined, and
              all standard PNG images are compressed using this scheme.

              Interlace Type

              At present, there are two legal values for
              interlace type: 0 (no interlace) or 1
              (line-wise interlace).

              With interlace type 0, rows are laid out
              continuously from top to bottom.

              With interlace type 1, rows are stored in the 
              following order:     

              Every eighth row, starting from row 0
              Every eighth row, starting from row 4
              Every fourth row, starting from row 2
              Every second row, starting from row 1                   
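The four passes can be generated from a start row and a step for each pass. This is a sketch with a hypothetical function name; it produces only the storage order of row numbers, not the pixel data itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `order` with the row numbers of an interlace type 1 image in the
   order they are stored. `order` must hold `height` entries.
   Returns the number of rows written (equal to `height`). */
static size_t interlace_row_order(uint32_t height, uint32_t *order)
{
    static const uint32_t start[4] = { 0, 4, 2, 1 };
    static const uint32_t step[4]  = { 8, 8, 4, 2 };
    size_t n = 0;
    for (int pass = 0; pass < 4; pass++)
        for (uint32_t y = start[pass]; y < height; y += step[pass])
            order[n++] = y;
    return n;
}
```

For an image 10 rows high, the resulting order is 0, 8, 4, 2, 6, 1, 3, 5, 7, 9. Every row appears exactly once, because the passes cover rows congruent to 0 and 4 modulo 8, then 2 modulo 4, then all odd rows.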

              The purpose of this feature is to allow images
              to "fade in" in a simple fashion that does
              minimal damage to compression efficiency,
              although the file size is slightly expanded
              on average. 

              Other interlace types have been proposed, and will
              replace this scheme in the final proposal if the gain 
              in visual quality is sufficient to outweigh any compression 
              penalties.

AGMA          Gamma Correction

              Gamma correction factor: 2 bytes

              The gamma correction chunk specifies the gamma of the
              device which created the image, and for which the
              color values are intended. If the encoder does not
              know the gamma value, it should not
              write a gamma chunk; the absence of a gamma chunk
              indicates the gamma is unknown. If the gamma chunk
              does appear, it must precede the PLTE chunk.

              If it is possible for the encoder to determine the gamma,
              or to make a strong guess based on the hardware on which it 
              runs, then the encoder is strongly encouraged to output
              the AGMA chunk.

              The gamma function determines the true response of the video 
              display to a given level, assuming that input levels have
              been normalized to a range between 0.0 and 1.0:

              brightness = inputLevel ^ gamma

              A value of 1000 is equivalent to a gamma of 1.0, a value 
              of 2000 to a gamma of 2.0, and so on (divide by 1000.0).
              
              Thus, when writing an image display program,
              if the display hardware has a gamma
              value of 2.0 (2000), and the gamma specified in
              the gamma correction chunk for a particular image is 
              3.0 (3000), then color and grayscale levels should ideally
              be normalized to a range between 0.0 and 1.0,
              then converted according to the following function:

              nativeLevel = inputLevel ^ (inputGamma / nativeGamma) 

              Where inputLevel is the level specified for that
              pixel in the PNG file, inputGamma is the gamma specified in 
              the PNG file, and nativeGamma is the gamma of the
              actual display to be used. 

              In practice, it is often difficult to determine
              the gamma of the actual display. It is common to
              assume a gamma of 2.2 (or 1.0, on hardware for
              which this value is common) and allow the user to
              modify this value at their option. 

              Also note that it is not difficult to calculate a gamma
              conversion table; it is *not* necessary to 
              perform transcendental math for every pixel!

              Although viewers are strongly encouraged to 
              implement gamma correction, in some cases speed
              may be a concern. In these cases, viewers are
              encouraged to provide gamma correction tables for
              gamma values of 1.0 and 2.2, and to use the table
              closest to the gamma indicated in the file. 

PLTE          Palette

              This chunk must appear for color type 1, and
              may appear for color types 3 and 4. If this chunk
              does appear, it must precede the first IDAT chunk.

              In the case of color types 3 and 4, the palette chunk is 
              optional, and provides a recommended set of from 1 to 256 
              colors to which the truecolor image should be quantized if 
              the display hardware cannot display truecolor directly. 
              If it is not present, the viewer must select colors on its own,
              but it is most efficient for this to be done once by
              the encoder. 

              The number of palette entries varies from 1 to 256.
              For color type 1, the number of entries should not
              exceed the range that can be represented by the
              bit depth (for example, 2^4 = 16 for a bit depth of 4).
              Note that this does NOT mean that there must be
              a full 16 entries. The number of entries is the
              length of the chunk divided by 3.
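The entry count follows directly from the chunk length, since each entry is three bytes. The sketch below (hypothetical name palette_entries) treats a color type 1 palette with more than 2^bitdepth entries as an error, which is stricter than the "should not" above.

```c
#include <stddef.h>

/* Number of entries in a PLTE chunk of `length` bytes, or -1 if the
   chunk is malformed. For color type 1 the count is also limited to
   2^bit_depth entries. */
static int palette_entries(size_t length, unsigned color_type,
                           unsigned bit_depth)
{
    if (length % 3 != 0)
        return -1;                     /* entries are 3 bytes: R, G, B */
    size_t n = length / 3;
    if (n < 1 || n > 256)
        return -1;
    if (color_type == 1 && n > ((size_t)1 << bit_depth))
        return -1;                     /* more entries than indices */
    return (int)n;
}
```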

              For color type 1, each palette entry consists of a 
              three-byte series:

                     red (0 = black, 255 = red),
                     green (0 = black, 255 = green),
                     blue (0 = black, 255 = blue)

              Image creation programs are strongly encouraged
              to place colors which the artist or algorithm
              regards as important first in the palette, when
              such information is available, in order
              to allow display hardware with a limited supply of 
              colors to make intelligent compromises.

              For color types 3 and 4, in which the palette is
              optional and only a suggested quantization, 
              the same exact format is used, again with
              3 bytes per palette entry:

                     red (0 = black, 255 = red),
                     green (0 = black, 255 = green),
                     blue (0 = black, 255 = blue)

              Note that the palette uses 8 bits (1 byte) per value 
              regardless of the image bit depth specification.
              In particular, the palette is 8 bits deep even when it is 
              a suggested quantization of a 16-bit truecolor image.

ATNS          Transparency. Transparency is a simple alternative to
              the full truecolor alpha channel which does not
              compromise compression.
              
              For color type 1:
              Transparent index into palette 
              (1 byte, range: 0 - (size of palette-1) )
              Any value outside the size of the palette is an error. Note
              that the size of the palette is determined by the size of
              the palette chunk (and thus the number of three-byte entries
              in it), and not by the bit depth. 

              For color type 2:
              Transparent gray level (2 bytes, range: 0 - (2^bitdepth - 1)) 

              For color type 3:
              Transparent RGB color (6 bytes, 2 bytes for
              red, green and blue components, range for each:
              0 - (2^bitdepth - 1))

              The transparency chunk, when present, identifies a
              single palette entry, grayscale level or
              RGB color which should be regarded as transparent. 
              Although transparency is not as elegant as the full
              alpha channel of color type 4, transparency does not adversely 
              affect the compression of the image. 

              When present, the ATNS chunk must precede
              the first IDAT chunk, and follow the
              PLTE chunk, if any.
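A sketch of decoding the ATNS data for each color type, with the range checks stated above. The names are hypothetical. Requiring the chunk length to match exactly (1, 2, or 6 bytes) is an assumption, and the function rejects color type 4, for which this draft defines no ATNS layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded ATNS contents; which fields are meaningful depends on color type. */
struct transparency {
    unsigned index;              /* color type 1 */
    uint16_t gray;               /* color type 2 */
    uint16_t red, green, blue;   /* color type 3 */
};

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* palette_size is the number of PLTE entries (used for color type 1).
   Returns 0 on success, -1 on error. */
static int parse_atns(const unsigned char *data, size_t length,
                      unsigned color_type, unsigned bit_depth,
                      unsigned palette_size, struct transparency *t)
{
    uint32_t max = bit_depth >= 16 ? 0xFFFFu : (1u << bit_depth) - 1u;
    switch (color_type) {
    case 1:                              /* index must lie inside the palette */
        if (length != 1 || data[0] >= palette_size)
            return -1;
        t->index = data[0];
        return 0;
    case 2:                              /* 0 .. 2^bitdepth - 1 */
        if (length != 2)
            return -1;
        t->gray = be16(data);
        return t->gray <= max ? 0 : -1;
    case 3:                              /* each component 0 .. 2^bitdepth - 1 */
        if (length != 6)
            return -1;
        t->red = be16(data);
        t->green = be16(data + 2);
        t->blue = be16(data + 4);
        return (t->red <= max && t->green <= max && t->blue <= max) ? 0 : -1;
    default:
        return -1;
    }
}
```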

ABGD          Background color. 

              When displaying the image in a
              stand-alone viewer, it is useful to specify the
              background color against which the image is
              intended to appear.

              For color type 1:
              Background index into palette 
              (1 byte, range: 0 - (size of palette-1) )

              For color type 2:
              Background gray level (2 bytes, range: 0 - (2^bitdepth - 1)) 

              For color type 3:
              Background RGB color (6 bytes, 2 bytes for
              red, green and blue components, range for each:
              0 - (2^bitdepth - 1))

              When present, the ABGD chunk must precede
              the first IDAT chunk, and follow the
              PLTE chunk, if any.

ACPY          Copyright notice. The notice will consist of
              ISO 8859-1 (Latin-1) text and will not be null-terminated.
              New lines should be denoted by a single
              line feed (10 decimal). If this chunk appears,
              it must appear before the first IDAT chunk.

ACMT          Comment. The comment will consist of
              ISO 8859-1 (Latin-1) text and will not be null-terminated.
              New lines should be denoted by a single
              line feed (10 decimal). If this chunk appears,
              it must appear before the first IDAT chunk. Several
              ACMT chunks may appear; they are distinct comments,
              not one continuous text.

APHY          Physical pixel dimensions.
              4 bytes: pixels per unit, X axis (unsigned integer)
              4 bytes: pixels per unit, Y axis (unsigned integer)
              1 byte: unit specifier

              The following values are legal for the unit specifier:
              0: units unknown (aspect ratio only)
              1: unit is the decimeter (10 centimeters)
              2: unit is the foot (12 inches) 

              Large units are employed to ensure sufficient
              resolution. If this ancillary chunk is not present,
              pixels are assumed to be square, and the physical
              size of each pixel is unknown. (Conversion note: one inch
              is equal to 2.54 centimeters.)
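Since printing software often works in inches, a viewer might convert APHY values as sketched below. The function names are hypothetical. The conversions use only the facts above: 1 foot is 12 inches, and 1 decimeter is 10/2.54 inches, so pixels per decimeter times 0.254 gives pixels per inch.

```c
#include <stdint.h>

/* Convert APHY pixels-per-unit to pixels per inch. Returns a negative
   value when the unit is 0 (aspect ratio only) or not a legal value. */
static double pixels_per_inch(uint32_t pixels_per_unit, unsigned unit)
{
    switch (unit) {
    case 1:  return pixels_per_unit * 0.254;  /* per decimeter -> per inch */
    case 2:  return pixels_per_unit / 12.0;   /* per foot -> per inch */
    default: return -1.0;                      /* no physical size known */
    }
}

/* Width divided by height of a single pixel; meaningful for every unit,
   including 0. ppu_x must be nonzero. */
static double pixel_aspect_ratio(uint32_t ppu_x, uint32_t ppu_y)
{
    return (double)ppu_y / (double)ppu_x;
}
```

For example, 3600 pixels per foot is 300 pixels per inch.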

APRI          Physical image location for printing purposes.
              
              4 bytes: image position in microns (X axis)
              4 bytes: image position in microns (Y axis)

              These values give the position on a printed page at
              which the image should be output when printed alone.

ATME          Time of image creation.

              4 bytes: time in seconds since the beginning of
              January 1st, 1970, Greenwich Mean Time. 
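On POSIX systems, time_t counts seconds from the same epoch, so the C library can format the value, as in this sketch (the function name is hypothetical). The sketch assumes POSIX time_t semantics; where time_t is a signed 32-bit type, values of 2^31 or more would not convert correctly.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Format an ATME value (seconds since 1970-01-01 00:00:00 GMT) as
   "YYYY-MM-DD HH:MM:SS" in UTC. Returns 0 on success, -1 on failure. */
static int format_atme(uint32_t seconds, char *out, size_t out_size)
{
    time_t t = (time_t)seconds;
    struct tm *utc = gmtime(&t);     /* not thread-safe; gmtime_r is on POSIX */
    if (utc == NULL)
        return -1;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", utc) == 0 ? -1 : 0;
}
```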

ATMB          Thumbnail image. This chunk contains an additional,
              complete PNG data stream, from the six-byte header to
              the EOF chunk, with the constraint that the
              enclosed stream should not include another ATMB chunk.
              The PNG stream should describe a much smaller version
              of the same image, suitable for icon or catalog use. 

              If the ATMB chunk appears, it should appear before
              the first IDAT chunk.

              Since the ATMB chunk is a complete PNG stream in its
              own right, it can easily be extracted and transmitted
              independently by packages such as web servers, and it can 
              also be palette-based even if the complete image is a
              truecolor image.

              Note that the entire thumbnail must fit in a single
              ATMB chunk; this is intentional as thumbnails are
              intended to be much smaller than the full image.

IDAT          Image data.

              The image data will be compressed using the
              compression scheme indicated by the compression
              type field of the HEAD chunk.

              IMPORTANT: the compressed image data is the concatenation
              of the contents of ALL the IDAT chunks. (If there are
              multiple IDAT chunks, they will always appear
              sequentially.) Viewers must be able to interpret such chunks. 
              (Simply speaking, the viewer knows it is not finished until it 
              has read as many pixels as are indicated by the
              image dimensions in the HEAD chunk.) This rule
              exists to permit encoders to work in a fixed
              amount of memory by outputting multiple chunks.

              The following text describes the uncompressed
              data stream which will be fed to the compressor
              or received from the decompressor.

              Pixels are always laid out left to right in 
              each row, and rows are arranged from
              top to bottom, except as modified by
              the interlace type field of the HEAD chunk.
  
              Color types 1 and 2

              For color type 1, each pixel value is an index into the 
              palette indicating which color in the palette should be
              displayed at that location. For color type 2 (grayscale),
              each pixel value is a grayscale level, where the maximum
              value representable by the bit depth is white.  
              
              For 1-bit images, each horizontal line of pixels is represented
              by a stream of bits, in which bit 7 (128) is the
              leftmost pixel in the byte and bit 0 (1) is the
              rightmost. Consecutive lines may share bits if the
              pixels in the line do not fit evenly into bytes.
              That is, if the last pixel of the line falls
              in bit 4 of a byte, the first pixel of the next
              line is stored in bit 3 of the same byte.

              For 2-bit images, the same scheme is followed, except that
              each pixel is represented by a 2-bit portion
              of a byte, with the leftmost bit being most
              significant. For instance, the first pixel
              of the line is represented by bits 7 (128) and 
              6 (64) of the byte. Consecutive lines may share bytes.
        
              For 4-bit images, the same scheme is followed, except 
              that each pixel is represented by a 4-bit portion
              of a byte, with the leftmost bit being most
              significant. For instance, the first pixel
              of the line is represented by bits 7 (128),
              6 (64), 5 (32) and 4 (16) of the byte. 
              Consecutive lines may share bytes.
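Because rows are not padded to a byte boundary in this draft, a pixel's position depends only on its index within the whole image. A sketch for non-interlaced 1-, 2-, and 4-bit images (the function name is hypothetical):

```c
#include <stdint.h>

/* Return the value of pixel (x, y) in a non-interlaced 1-, 2- or 4-bit
   image whose rows are packed continuously with no padding between rows.
   The leftmost pixel occupies the most significant bits of each byte. */
static unsigned get_packed_pixel(const unsigned char *data, uint32_t width,
                                 uint32_t x, uint32_t y, unsigned depth)
{
    uint64_t bit = ((uint64_t)y * width + x) * depth;   /* offset from start */
    unsigned shift = 8u - depth - (unsigned)(bit % 8);  /* position in byte */
    unsigned mask = (1u << depth) - 1u;
    return (data[bit / 8] >> shift) & mask;
}
```

With a width of 4 at 1 bit per pixel, pixel (0, 1) is at bit offset 4, which is bit 3 of the first byte, matching the example above.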

              For 8-bit images, each pixel is represented by a single 
              byte. For 16-bit grayscale images (color type 2),
              each pixel is represented by a two-byte unsigned integer
              in network byte order.

              IMPORTANT:

              For 8- and 16-bit grayscale images (color type 2, bit depth
              of 8 or 16), the values are next input to the CROSS filter 
              (for non-interlaced images; see below) or to the SUB filter 
              (for interlaced images; see below) in order to improve 
              compression before being input to the compressor itself. 
              This step is NOT employed for palette color images 
              (color type 1).

              Color types 3 and 4

              For color type 3, each pixel is represented by
              a red value, a green value, and a blue value,
              each 8 or 16 bits wide according to the bit
              depth. For color type 4, an additional alpha
              (opacity) value of the same depth is added
              for each pixel.
              
              IMPORTANT:

              The values are next input to the CROSS filter 
              (for non-interlaced images; see below) or to the SUB filter 
              (for interlaced images; see below) in order to improve 
              compression before being input to 
              the compressor itself. 

EOF           End of File

              CRC (4 bytes)

              The EOF chunk must appear last in the PNG file.
              Note that the letters EOF are followed by a
              space (decimal 32).

              The EOF chunk contains a 4-byte CRC
              (Cyclic Redundancy Check) of all preceding
              bytes in the file, including the identifying
              header, all preceding chunks, and the EOF
              chunk name and length field. The CRC is NOT optional.
              
              If the CRC does not match that calculated by
              the viewer, the viewer may elect to attempt to 
              display the contents of the file, but must warn the user
              that the checksum is incorrect. This mechanism
              helps to detect images that have been improperly
              transmitted.
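This draft does not name the CRC polynomial. The sketch below assumes the CRC-32 used by zip and gzip (reflected polynomial 0xEDB88320, register preset to all ones and inverted at the end); crc32_update is a hypothetical name. A viewer would run it over every byte from the start of the file through the EOF chunk's length field, then compare the result with the four CRC bytes, read in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (assumed zip/gzip variant). Start with crc = 0; pass
   the previous result back in to continue over further data. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)   /* shift out one bit at a time */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Because the result can be fed back in, the CRC can be accumulated chunk by chunk while streaming instead of buffering the whole file.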

5. Details of Specific Algorithms

Inflate and Deflate

See the zip/unzip package, which includes source code for both purposes in the files inflate.c and deflate.c, with a very permissive license. Documentation of the compression scheme is also available; see the zip/unzip package for references. (zip/unzip and pkzip are compatible but not identical. pkzip is commercial software.)

A formal, detailed specification of inflate and deflate will be included in the final standard, and is being written at this time. The formal specification will be compatible with the format defined by the inflate.c/deflate.c code.

The Sub Filter

The sub filter is used to improve compression on interlaced images: truecolor images (color types 3 and 4) and 8- and 16-bit grayscale images (color type 2).

For each pixel, output the difference between that pixel and the previous pixel, modulo the range possible in that bit depth. For instance, for a bit depth of 8, if the previous pixel was 16 and the current pixel is 64, store 48. If the previous pixel was 255 and the current pixel is 20, store 21 (20 - 255 = -235, which is 21 modulo 256). Note that unsigned arithmetic is used: to reverse the filter, add the stored value to the previous, already restored pixel, again modulo the range. IMPORTANT: At the start of each line, consider the previous pixel value to be zero.
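A sketch of the sub filter and its inverse for 8-bit samples; the function names are hypothetical. The text above speaks only of the "previous pixel"; for multi-channel images this sketch assumes each channel is differenced against the same channel of the previous pixel, as the cross filter description states explicitly. 16-bit samples would need the same logic applied to two-byte values.

```c
#include <stddef.h>

/* Filter one row in place. `channels` is samples per pixel (1 gray,
   3 RGB, 4 RGBA). The first pixel is differenced against zero, i.e. kept. */
static void sub_filter_row(unsigned char *row, size_t pixels, size_t channels)
{
    /* Walk backwards so each subtraction uses the original left neighbor. */
    for (size_t i = pixels * channels; i-- > channels; )
        row[i] = (unsigned char)(row[i] - row[i - channels]);
}

/* Undo the filter in place. */
static void sub_unfilter_row(unsigned char *row, size_t pixels, size_t channels)
{
    /* Walk forwards so each addition uses the already restored neighbor. */
    for (size_t i = channels; i < pixels * channels; i++)
        row[i] = (unsigned char)(row[i] + row[i - channels]);
}
```

The rows {16, 64} and {255, 20} filter to {16, 48} and {255, 21}, reproducing the examples above.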

The Cross Filter

The cross filter is used to improve compression on non-interlaced images: truecolor images (color types 3 and 4) and 8- and 16-bit grayscale images (color type 2). Cross is similar to sub, but also takes the previous line into account, which is highly effective as long as the image is not interlaced.

Output the following value, using unsigned modulo arithmetic and integers of the size appropriate to the bit depth (8 or 16):

Pixel[x][y] - Pixel[x-1][y] - Pixel[x][y-1] + Pixel[x-1][y-1]

for each channel of each pixel (the gray level for color type 2; red, green, and blue for color type 3; red, green, blue, and alpha for color type 4).

On the first row of the image, the previous row is considered to have contained only zeroes. On the first pixel of each row, the previous pixel is considered to have contained only zeroes.

To reverse the effect of the cross filter after decompression, output the following value:

CrossedValue + Pixel[x-1][y] + Pixel[x][y-1] - Pixel[x-1][y-1]

storing the result as the restored Pixel[x][y], which is then used as the left neighbor of the next pixel in the row and as the upper neighbor of the pixel below it in the next row.
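The cross filter and its inverse for 8-bit samples, as a sketch (function names hypothetical; 16-bit images would need two-byte arithmetic). Filtering reads from a separate input buffer so that every neighbor is an original value; unfiltering runs in place from left to right and top to bottom, so every neighbor it reads has already been restored. Conversion to unsigned char provides the modulo-256 wraparound.

```c
#include <stddef.h>

/* Left, upper, and upper-left neighbors of sample i at pixel (x, y);
   anything outside the image counts as zero. */
static void neighbors(const unsigned char *img, size_t width, size_t channels,
                      size_t x, size_t y, size_t i,
                      int *left, int *up, int *upleft)
{
    size_t row = width * channels;                 /* samples per row */
    *left   = x > 0 ? img[i - channels] : 0;
    *up     = y > 0 ? img[i - row] : 0;
    *upleft = x > 0 && y > 0 ? img[i - row - channels] : 0;
}

/* out = P[x][y] - P[x-1][y] - P[x][y-1] + P[x-1][y-1] (mod 256), per
   channel. `in` and `out` must not overlap. */
static void cross_filter(const unsigned char *in, unsigned char *out,
                         size_t width, size_t height, size_t channels)
{
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            for (size_t c = 0; c < channels; c++) {
                size_t i = (y * width + x) * channels + c;
                int l, u, ul;
                neighbors(in, width, channels, x, y, i, &l, &u, &ul);
                out[i] = (unsigned char)(in[i] - l - u + ul);
            }
}

/* In place: P = crossed + P[x-1][y] + P[x][y-1] - P[x-1][y-1] (mod 256). */
static void cross_unfilter(unsigned char *img, size_t width, size_t height,
                           size_t channels)
{
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            for (size_t c = 0; c < channels; c++) {
                size_t i = (y * width + x) * channels + c;
                int l, u, ul;
                neighbors(img, width, channels, x, y, i, &l, &u, &ul);
                img[i] = (unsigned char)(img[i] + l + u - ul);
            }
}
```

For the 2x2 grayscale image with rows 10, 20 and 30, 45, the filtered values are 10, 10 and 20, 5.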

The Alpha Channel

Standalone image viewers can ignore the alpha channel, provided that they properly skip over it in order to be in the right position to read the next pixel. However, if the background color has been set with the ABGD chunk, the alpha channel can be meaningfully interpreted with respect to it even in a standalone image viewer.

World Wide Web browsers and the like should regard any pixel with an alpha channel value of zero as transparent (the pixel should be given the background color of the browser), and any pixel with the maximum alpha channel value for that bit depth as opaque (not blending with the background at all).

Viewers which are not in a position to smoothly combine foreground and background colors should regard any nonzero alpha channel value as fully opaque (fully foreground color).
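A per-sample sketch of both behaviors for 8-bit alpha. The linear blend is an assumption: this draft does not give a formula for combining foreground and background colors. The function name is hypothetical.

```c
/* Combine one 8-bit foreground sample with a background sample using
   alpha (0 = transparent, 255 = opaque). When can_blend is 0, any
   nonzero alpha is treated as fully opaque. */
static unsigned char composite8(unsigned char fg, unsigned char bg,
                                unsigned char alpha, int can_blend)
{
    if (!can_blend)
        return alpha ? fg : bg;
    /* Assumed linear blend, rounded: (alpha*fg + (255-alpha)*bg) / 255 */
    unsigned v = (unsigned)alpha * fg + (255u - alpha) * bg;
    return (unsigned char)((v + 127u) / 255u);
}
```

Alpha 0 always yields the background and alpha 255 the foreground, matching the transparent and opaque cases described above.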

For applications that do not require a full alpha channel, or cannot afford the price in compression efficiency, the ATNS transparency chunk is also available.

6. Pronunciation

PNG is pronounced "ping".

End of PNG Specification
